Deep Learning


  • Downloads: 1423
  • Type: EPUB+TXT+PDF+MOBI
  • Create Date: 2021-03-29 11:20:17
  • Update Date: 2025-09-06
  • Status: finished
  • Author: Ian Goodfellow
  • ISBN: 0262035618
  • Environment: PC/Android/iPhone/iPad/Kindle

Summary

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

Download

Reviews

Денис Чакъров

Hard to read but maybe the best book in the field

Igor Oliveira

This book gives an amazing introduction/foundation to Deep Learning. It is very intuitive, and highly compartmentalized - that is, you can choose the topics you need to read for your understanding, and skip others, which makes the reading much faster and much more useful.

Joel Collier

I’ve just had a tussle with the Goodreads AI, and it’s made me less charitable to machine learning. Specifically, Goodreads duplicated the record for reading this book and struggled to remove one of the records, and when it did, it deleted my original review. Stupid AI. Machine learning succeeds because (a) it can rely on a large sample to train the AI, and (b) it can approximate almost any function through deep neural networks. Once you understand how AI is able to process and categorize experiences, automating tasks like driving or labeling images of rabbits (or bridges, or motorcycles, or whatever) becomes less impressive. The only way deep learning AI will achieve some sort of singularity is if natural intelligence is easy peasy lemon squeezy. If cost function minimization achieves global dominance through a deep learning Skynet, it would only show how unimpressive natural intelligence is. This is the message I took away from Deep Learning.

Attila

Main points:

* Hard-coding knowledge about the world in formal languages (the knowledge base approach) was never successful.
* AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. This capability is known as machine learning.
* Deep learning is a type of machine learning, a technique that allows computer systems to improve with experience and data.
* The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given.
* One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself: representation learning.
* Each piece of information included in the representation of the patient is known as a feature.
* When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data.
* It can be very difficult to extract such high-level, abstract features from raw data. Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations; multiple layers recognize each of the features.
* Supervised and unsupervised learning: supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.
* The age of “Big Data” has made machine learning much easier, because the key burden of statistical estimation (generalizing well to new data after observing only a small amount of data) has been considerably lightened: there is simply more data.
* We also have the computational resources to run much larger models today; the more connections between neurons, the more powerful the network.
* Reinforcement learning: an autonomous agent must learn to perform a task by trial and error, without any guidance from the human operator.
* Two kinds of probability: frequentist probability (if I flip a coin, how many times will it come up heads?) and Bayesian probability, related to qualitative levels of certainty (you are 40% likely to have the flu). In the case of the doctor diagnosing the patient, we use probability to represent a degree of belief, with 1 indicating absolute certainty that the patient has the flu and 0 indicating absolute certainty that the patient does not have the flu.
* Information theory is a branch of applied mathematics that revolves around quantifying how much information is present in a signal. The basic intuition behind information theory is that learning that an unlikely event has occurred is more informative than learning that a likely event has occurred (see the short sketch below).
* Parameters and weights, bias.
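That last information-theory intuition has a one-line formula behind it: the self-information of an outcome x is I(x) = -log P(x), so rarer events carry more bits. A tiny illustrative sketch in Python (my own example, not code from the book):

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = -log_b P(x): rarer events are more informative."""
    return -math.log(p, base)

# A fair coin flip (p = 0.5) carries 1 bit of information;
# a 1-in-1000 event carries roughly 10 bits.
print(self_information(0.5))    # 1.0
print(self_information(0.001))  # ~9.97
```

With base-2 logarithms the unit is bits; with natural logarithms, which the book tends to use, the unit is nats.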

Darvish

Excellent. One of the standards in this space.

Rafał Grochala

Sufficiently wide, sometimes surprisingly detailed, very variable level, and not enough visuals. Shows a bit of age (e.g. loads of Boltzmann machines), but that's still a solid textbook.

Alex

After four years of coming back to it I feel confident in considering it "read". I just wish there would be a new edition to fill in the latest insights from all the papers in the meantime, so I can reference it a few years longer. It is ironic and encouraging, at the same time, to go back to chapters lamenting how little theoretical underpinning certain methods have, despite their success in practice, and to consider how much more we have achieved in those directions with the same, or only marginally more, understanding of the why. Maybe the detailed and deep understanding of these shortcomings, way back in 2016, is why this book has aged so well.

Lily

There have been a lot of advancements in the field since the book was written. It's also difficult to learn brand new concepts from the book, if you were unfamiliar with them prior to reading. It does provide an overview of many different topics, though.

Duncan McKinnon

Great reference book for a wide range of topics in ML and deep learning. Some areas are already becoming a bit outdated as new methods emerge, but it is a good overall mix of fundamental and cutting-edge material.

Nati S

I read this book to supplement my master thesis work on deep learning. Excellent book.

Shehryar Saroya

Andrew Ng's Deep Learning Coursera course does part 2 better. And part 3 is pretty awful (in terms of exposition, not content).

Saeed Najafi

The non-research chapters are a very nice summary of the current techniques in deep learning. The last chapters are too broad and useless.

Ewan

Finally made my way through the bulk of this; it has all the fundamentals of the core DL concepts in really good depth, along with some of the more specific and recent innovations. One to continually refer back to.

Dhirajk

Great book!! Well organized, concise, and complete!! This book summarises the vast and complex topic of deep learning in a textbook, by some of the leaders in the field. What has been most valuable is seeing how it all fits together. There are lots of books, blogs, and videos out there, but this is one of the most comprehensive books on deep learning I've seen to date. I highly recommend this book. I have also included it in the Top 10 Free Books for Machine Learning and Data Science YouTube video: https://www.youtube.com/watch?v=6uPav...

Bharath M

What an explanation of the tough topics; it's an amazing book, recommended both for beginners and advanced deep learning practitioners. Worth the time spent reading the content.

Kevin Doran

I haven't read this cover-to-cover, so keep that in mind when reading the review. I feel that the authors have not created a learning experience for the reader; they have just written down many things that they know in an organized fashion—organized by how it exists in the landscape of deep learning, not organized in a way conducive for a reader to improve understanding. I gave up trying to learn from this book and switched to other material. It is now a piece of okay but not great reference material. The book has 3 sections. The first is almost pointless, as it covers topics so fast as to achieve almost nothing. The second section is the best, although opening an editor and writing some code is a better way to appreciate the information presented in the second section. The third section, which is already quite dated, fails to give me much insight compared to other learning resources (distill.pub, for example). Anyone who is thinking of reading this book should read David MacKay's book first; far more insight will be gained from reading that.

Yanwei Liu

A little bit hard for beginners who want to learn Deep Learning. Deep Learning by Ian Goodfellow requires lots of math knowledge to understand the concepts delivered in this book, but if you are very confident in your math, maybe this book will be your best choice.

Arman Behrad

Very comprehensive, from Linear Algebra to Probability theory and deep learning algorithms. Such a Bible in its area.

Fabian Zhafransyah

As an undergraduate, I'd say it's pretty difficult to read. I reread every page that contains an equation at least 4 times, starting from the 4th chapter. The good thing is it provides a complete theoretical in-and-out explanation about neural networks.

Amir Hamed

This is definitely not a book to start learning DL. You may end up questioning your intellectual abilities in the end, as the book is not easy to understand. I am not saying that the book is bad, but it should be used as a reference. It was a mistake to read it while taking DL at school. The book is dense and explains the topics not so clearly. The chapters usually end with a literature review on the topic to make it more confusing. It definitely should not be the first or even the second book to read on DL.

Dileep

It's a great book for anyone interested in the theory behind Deep Learning. It is divided into three sections. The first section introduces the mathematical building blocks required for Deep Learning, or in general Machine Learning. The second part is where Deep Learning is actually discussed, starting from feed-forward networks, to CNNs, RNNs, and LSTMs. It also covers a lot on gradient propagation and the various hurdles that are usually encountered. The third part talks about some advanced concepts that are supposed to be the cutting edge. But because of the limitations of print media and books that are released once every few years, the cutting edge discussed in this book is no longer state of the art. Still, the topics introduced give a very good mathematical intuition to the reader, which can help someone pick up new papers quickly. A minor nitpick for me is that there are no exercises at the end of each chapter. That would have been great to test the understanding of the concepts.

Ibrahim Sharaf ElDen

Deep Learning is an excellent book for anyone who has intermediate knowledge of neural networks, the relevant mathematical background, and 1-2 years of working/academic experience, and who is looking to formulate and hone their theoretical notation and background. This book isn't for absolute beginners who are looking to start learning about deep learning and neural nets, as the book is heavy on math and theory. Part I in the book was good, not great, as it covers the fundamental prerequisites for deep learning, but sometimes (such as in the Linear Algebra chapter) the authors just go on and list the concepts without stating the motivation or the connection between those concepts and others in deep learning. Part II is my favourite part: elaborate explanations of different fundamental deep learning concepts; the optimisation chapter was outstanding, as was the recurrent neural nets one. Part III, which focuses on frontier research, I liked some bits of (e.g., the Autoencoders chapter). Still, most of it was a bit irrelevant to deep learning research nowadays as it's quite outdated. A second edition with some updates would be a great addition; sometimes the authors also just go on and add lots of math without clear explanation, which wasn't very convenient. All in all, the Deep Learning book is the current best one offering a holistic overview of the field. However, it can be a bit outdated, with a bit of a steep learning curve.

Ghadah MK

This book is very technical and informative on the deep learning topic. I’ve actually read it as part of our master’s courses, which explains that it’s not an easy book to follow without guidance or prior knowledge in deep learning. However, the math examples and equations are very well explained in the book. I can say that this book gives an explanation of all the technical parts of deep learning, but if someone doesn’t have knowledge of this area, I don’t recommend starting with it.

Eryk Banatt

A good introduction to deep learning, and a useful reference for anyone interested in doing work in the field. The real strength of this book is Part II, where the practical applications of deep learning are given a proper and relatively clear introduction. Part I is mostly background, some light review of linear algebra / probability / classical ML, probably very skimmable if you have any familiarity with the topics they are discussing. This book is probably not a good starting point if any part of Part I is particularly new, save for maybe classical ML, but it’s useful to have as a refresher. Part II is pretty comparable content-wise to Andrew Ng’s deeplearning.ai course, except none of the math is glossed over / waved away. This makes it a pretty nice deeper dive for anybody who walked away from that course feeling a little bit like their hand was held through it. Part III I think was a bit of a dense read, mostly on graphical models, leading up to the introduction of Goodfellow’s most famous work (GANs). I admit that this section didn’t interest me much as I don’t really use graphical models in my own work. This section also doesn’t cover much in the way of modern use of generative models, and things like “evaluating output of generative models” are mostly just mentioned as a hard problem. Useful for “how did we get here”, not so much for “where do we go from here”. The chapters on representation learning and autoencoders aren’t bad, though. Overall a good, highly-flashcardable read for someone interested in building a theoretical base for deep learning.

Mer

Needless to say, the impact and inherent design of deep learning are inevitable. Yet we might want to step back as the public: how accurate is our epistemology towards this happening? Should we see AI as ontology or as hyper-object? How will fear-driven ethics interact with technological development? These are all questions to ask from social studies, now that AI has become part of social happenings. Very interesting.

Emil Petersen

This book has quite a range, both in quality and in how advanced the chapters are. Personally I really appreciated the opening chapters that introduce probability theory, linear algebra and other fundamental subjects for deep learning. The second part on application was the least interesting for me, not because I am not in need of applicative deep learning, but because it was broad and had little coherence other than going under the collective deep learning umbrella. The final chapters on research topics, specifically the chapters on representation learning, inference and probabilistic models, were the most enjoyable. The book is huge and dense, and it's hard not to appreciate all the effort and skill that is required to make such a work. It's not the best in terms of explanation or writing, but this can be remedied by a little more effort from the reader.

Pavlo

This book is great in the sense that it’s comprehensive and covers multiple topics in the field of deep learning, but the writing style makes it so hard to read that the best purpose it can serve is merely as a reference to the topics; you’d go elsewhere to actually learn them. I would not recommend it.

Yash Patel

Excellent overview of both the fundamental math and more engineering-specific implementation details. The textbook is divided into three sections, each quite substantial in length:

Section 1) Walks through fundamentals and "traditional" ML approaches, i.e. Naive Bayes, SVMs, PCA, factor analysis, etc. Fairly standard material for a first rigorous course on this sort of material, but it was nice to see all the material presented clearly, both the prose and accompanying math. The prose was occasionally long-winded, when a couple of diagrams would have sufficed, but better to err on the side of over-explanation than mathematical terseness.

Section 2) Walked through more "engineering"-y details specific to neural networks, like applications and heuristics that are used in practice. This is most likely the stuff people already know if they've used DNNs in practice, but there was an *extremely* interesting section on the equivalence of different classic heuristics (such as dropout and early stopping) to traditional regression regularizations. Super cool stuff! Most of the implementation details don't really seem too necessary to read in detail, but it serves as a nice reference.

Section 3) Some research frontiers: these seemed to greatly overlap with Daphne Koller's book on Probabilistic Graphical Models (about half of the subsections here do). That book seems to do a better job, but this was a good overview of the material before jumping into that.

Dennis Cahillane

This book is for grad students, advanced practitioners and theoreticians, so I (a hobbyist engineer) was only able to read about the first half. It works well to gauge your level of understanding of how deep learning is implemented. My big takeaway from the book is that for my purposes I don’t need to understand these implementation details, even if I find them very interesting, because anything worth doing is implemented in libraries and by cloud providers. A hobbyist engineer like me can run code and get results using existing implementations. I am prone to go down rabbit holes on projects, reading more background and implementation details than necessary, so I stopped grinding through this book. I was able to train an AI that plays the SNES game Kirby’s Avalanche and make a YouTube video about it anyway!

Hamish Seamus

I wasn't able to follow beyond about half way.

* Following the success of back-propagation, neural network research gained popularity and reached a peak in the early 1990s. Afterwards, other machine learning techniques became more popular until the modern deep learning renaissance that began in 2006.
* Regularization of an estimator works by trading increased bias for reduced variance.
* In neural networks, typically only the weights and not the biases are used in normalisation penalties.
* Effect of weight decay: small-eigenvalue directions of the Hessian are reduced more than large-eigenvalue ones.
* Linear hidden units can be useful: if rather than g(Wx+b) we have g(UVx+b), then we have effectively factorised W, which can save parameters (at the cost of constraining W to lower rank).
* "The softplus demonstrates that the performance of hidden unit types can be very counterintuitive—one might expect it to have an advantage over the rectifier due to being differentiable everywhere or due to saturating less completely, but empirically it does not."
* Hard tanh.
* L2 regularisation comes from a Gaussian prior over the weights + MAP.
* L1 regularisation comes from a Laplacian prior.
* L-norms are equivalent to constrained optimisation problems: constraining to an L-n ball whose radius depends on the form of the loss.
* With early stopping, after you've finished on the training set, you can also train on the validation data:
  * You can either train again from scratch with the validation data added in, doing the same number of parameter updates or the same number of passes through the data,
  * Or continue training on the validation data after the first round of training, perhaps until the objective function on the validation set reaches the same level as on the training set.
* Early stopping is in a sense equivalent to L2 regularization in that it limits the length of the optimisation trajectory. It is superior in that it automatically fine-tunes the eta hyperparameter.
* Bagging = bootstrap aggregating.
* Dropout: typically, an input unit is included with probability 0.8, and a hidden unit is included with probability 0.5.
* Although the extra cost of dropout per step is negligible, it does require longer training and a larger model. If the dataset is large then this probably isn't worthwhile.
* Wang and Manning (2013) showed that deterministic dropout can converge faster.
* Regularising noise has to be multiplicative rather than additive, because otherwise the outputs could just be made very large.
* Virtual adversarial training: generate an example x which is far from any real examples and make sure that the model is smooth around x.
* Smaller batch sizes are better for regularisation, but often underutilise hardware.
* Second-order optimisation techniques like Newton's method appear to have not taken off because they get stuck in saddle points.
* Cliffs are common in objective landscapes because of "a multiplication of many factors".
* Exploding/vanishing gradient illustration: if you are multiplying by M repeatedly, then eigenvalues > 1 will explode and < 1 will vanish. This can cause cliffs.
* A sufficient condition for convergence of SGD is $\sum_{k=1}^\infty \epsilon_k = \infty$ and $\sum_{k=1}^\infty \epsilon_k^2 < \infty$.
* Often the learning rate is decayed such that $\epsilon_t = (1-\alpha)\epsilon_{t-1} + \alpha\epsilon_\tau$.
* Newton's method isn't commonly used because calculating the inverse Hessian is O(k^3) in the number of parameters k.
* Coordinate descent: optimise one parameter at a time.
* Block coordinate descent: optimise a subset of parameters at a time.
* Three ways to decide the length of an output sequence of an RNN:
  * Have an end-of-sequence token,
  * Have a binary output saying whether the sequence is done or not,
  * Have a countdown to the end as one of the outputs.
* Reservoir computing: hard-code the weights for the recurrent and input connections, only train the output.
* You can accept only predictions above a given confidence. Then the metric you might use is coverage.
* If performance on the training set is low, try to get it up by increasing model size, more data, better hyperparameters, etc. If performance on the test set is low (while training performance is fine), try adding regularizers, more data, etc.
* Hyperparameter vs. loss tends to be U-shaped.
* One common learning-rate regime is to wait for plateaus, then reduce the learning rate 2-10x.
* Learning rate is probably the most important hyperparameter.
* Learning rate vs. training error: apparently exponential decay, then a sharp jump upwards (p. 430).
* In conv nets, there are three padding schemes (TODO: look up names): no padding; pad enough to preserve image size; pad enough so that all pixels get an equal number of convolutions (increasing image size).
* Grid hyperparameter search is normally iterative: if you search [-1, 0, 1] and find 1 is the best, then you should expand the window to the right.
* To debug: look at the most confident mistakes.
* Test for backprop bugs: manually estimate the gradient (see p. 439); a small sketch of this check follows after these notes.
* Monitor outputs/entropy, activations (for dead units), and the ratio of update magnitude to weight magnitude (should be ~1%).
* Your metric can be coverage: hold accuracy constant and try to improve coverage.
* Groups of GPU threads are called warps.
* Mixture of experts: one model predicts the weighting of the expert predictions.
  * Hard mixture of experts: one model chooses a single expert predictor.
  * Combinatorial gaters: choose a subset of experts to vote.
  * Switch: the model receives a subset of inputs (similar to attention).
* To save on compute, use cascades: use a cheap model for most instances and an expensive model when some tricky feature is present. Use a cascade of models to detect the tricky feature: the first has high recall, the last has high precision.
* A common preprocessing step is to make each pixel have mean zero and std one. But for low-information pixels this may just amplify noise or compression artefacts, so you want to add a regularizer (p. 455).
* Image preprocessing:
  * sphering = whitening,
  * GCN = global contrast normalisation (the whole image has mean zero and std 1),
  * LCN = local contrast normalisation (each window/kernel has mean zero and std 1).
* Rather than a binary tree for hierarchical softmax, you can just have a breadth-sqrt(n), depth-2 tree.
* ICA is used in EEG to separate the signal from the brain from the signals from the heart and the breath.
* Recirculation: an autoencoder trained to match the layer-1 activations of the original input and the reconstructed input.
* Under-complete autoencoders have lower representational power in hidden space than the original space.
* Over-complete autoencoders have at least as much representational power in hidden space as the original space.
* Denoising autoencoder: L(x, g(f(x + epsilon))).
* CAE = contractive autoencoder: penalises derivatives (so the representation doesn't change much with small changes in x).
* Score matching: try to get the same $\nabla_x \log p(x)$ for all x.
* Rifai 2011a is where the iterative autoencoder comes from.
* Autoencoder failure mode: f can simply multiply by epsilon, and g divide by epsilon, and thereby achieve perfect reconstruction and a low contractive penalty.
* Semantic hashing: use an autoencoder to create a hash of instances to help with information retrieval. If the last layer of the autoencoder is a softmax, you can force it to saturate at 0 and 1 by adding noise just before the softmax, so it has to push further towards 0 and 1 to let the signal get through.
* Denoising autoencoders learn a vector field pointing towards the instance manifold.
* Predictive sparse decomposition is a thing.
* Autoencoders can perform better than PCA at reconstruction.
* I didn't understand much of the linear factor models: probabilistic PCA, slow feature analysis, independent component analysis.
* You can coerce a representation that suits your task better. E.g., for a density estimation task you can encourage independence between the components of the hidden layer h.
* Greedy layerwise unsupervised pretraining goes back to the neocognitron, and it was the discovery that you can use this pretraining to get a fully connected network to work properly that sparked the 2006 DL renaissance.
* Pretraining makes use of two ideas: parameter initial conditions can have a regularisation effect, and learning about the input distribution can help with prediction.
* Unsupervised pretraining has basically been abandoned, except for word embeddings. For moderate and large datasets simple supervised learning works better; for small datasets Bayesian learning works better.
* What makes good features? Perhaps if you can capture the underlying (uncorrelated) causes, these would be good features.
* Distributed representations are much more expressive than non-distributed ones.
* Radford 2015 does vector arithmetic with images.
* While NFL theorems mean that there's no universal prior or regularisation advantage, we can choose ones which provide an advantage in the range of tasks we are interested in. Perhaps priors similar to those humans or animals have.
* To calculate probabilities in undirected graphs (e.g., modelling sickness between you, your coworker, and your roommate), take the product of the "clique potentials" for each clique (and normalise). The distribution over clique products is a "Gibbs distribution".
* The normalising Z is known as the partition function (statistical physics terminology).
* "d-separation": in the graphical models context, this stands for "dependence separation" and means that there's no flow of information from node set A to node set B.
* Any relationship structure can be modelled with directed or undirected graphs. The value of these is that they eliminate dependencies.
* "Immorality": a collider structure in a directed graph. To convert this to an undirected graph, it needs to be "moralised" by connecting the two incoming nodes (creating a triangle). The terminology comes from a joke about unmarried parents.
* In undirected graphs, a loop of length greater than 3 without chords needs to be cut up into triangles before it can be represented as a directed graph.
* To sample from a directed graph, sample from each node in topological order.
* Restricted Boltzmann Machine = Harmonium. It consists of one fully connected hidden layer and one non-hidden layer; the "restricted" means that there are no connections between the hidden units.
* Continuous Markov chain = Harris chain.
* Perron-Frobenius theorem: for a transition matrix, if there are no zero-probability transitions, then there will be an eigenvalue of one.
* Running a Markov chain to reach its equilibrium distribution is called "burning in" the Markov chain.
* Sampling methods like Gibbs sampling can get stuck in one mode. Tempered transitions means reducing the temperature of the transition function between samples to make it easier to jump between modes.
* There are two kinds of sampling algorithms: Las Vegas sampling, which always either returns the correct answer or gives up, and Monte Carlo sampling, which will always return an answer, but with a random amount of error.
* You can decompose the log-likelihood gradient into a positive phase and a negative phase: $\nabla_\theta \log p(x;\theta) = \nabla_\theta \log \tilde{p}(x;\theta) - \nabla_\theta \log Z(\theta)$.
* I'm not understanding a ton of the chapter on the partition function.
* Biology vs. backprop: if brains are implementing backprop, then there needs to be a secondary mechanism in addition to the usual axon activation. Hinton 2007a and Bengio 2015 have proposed biologically plausible mechanisms.
* Dreams may be sampling from the brain's model during negative-phase learning (Crick and Mitchison, 1983), i.e., they "approximate the negative gradient of the log partition function of undirected models." It could also be about sampling p(v,h) for inference (see p. 651), or may be about reinforcement learning.
* Most important generative architectures:
  * RBM: restricted Boltzmann machine, bipartite and undirected,
  * DBN: deep belief network, an RBM plus a feedforward layer to the sensors,
  * DBM: deep Boltzmann machine, a stack of RBMs.
* RBMs are bad for computer vision because it's hard to express max pooling in energy functions. Also, the partition function changes with different sizes of input, and they don't work well with boundaries.
* There are lots of problems with evaluating generative models. One approach is blind taste testing; to prevent the model from memorising, also display the nearest neighbour in the dataset. Some models are good at maximising the likelihood of good examples, others at minimising the likelihood of bad examples, depending on the direction of the KL divergence. Theis et al. (2015) reviews issues with evaluating generative models.
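The backprop-debugging note above (manually estimating the gradient) is the one bullet that turns directly into code. Below is a minimal sketch of a finite-difference gradient check on a toy objective; it is my own illustration under assumed toy functions (`loss`, `analytic_grad`), not code from the book:

```python
import numpy as np

def loss(w):
    # Toy objective L(w) = 0.5 * ||w||^2, whose true gradient is w itself.
    return 0.5 * np.dot(w, w)

def analytic_grad(w):
    # The gradient we want to verify (stands in for a backprop implementation).
    return w

def numeric_grad(f, w, eps=1e-5):
    # Central finite differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps).
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        g[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return g

w = np.random.randn(5)
rel_err = np.linalg.norm(analytic_grad(w) - numeric_grad(loss, w)) / (
    np.linalg.norm(analytic_grad(w)) + 1e-12)
print(f"relative error: {rel_err:.2e}")  # tiny (around 1e-10) if the gradients agree
```

In practice the same check is run against a network's backprop gradients on a handful of parameters, since looping over every parameter this way is far too slow for large models.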